Speech Recognition with Hierarchical Codebook Search

Author

  • Tobias Bühler
Abstract

1 Specifications
  1.1 Problem Description
  1.2 The Decimation Algorithm
  1.3 Goal
2 Preconditions
  2.1 Distortion and Distance Measurement
  2.2 Random Generated Data
  2.3 Number of Initial Codebook Vectors
3 Solution Process
4 Program Description
  4.1 Training, Test and Codebook Vector Generation
  4.2 Hierarchical Codebook Generation with Decimation
  4.3 Hierarchical Codebook Generation with LBG
  4.4 Test and Quantization Programs
5 Evaluation
  5.1 General Results
    5.1.1 Equality of a Vector Quantization with a Full and a Hierarchical Codebook
    5.1.2 Degenerated Trees
  5.2 Evaluation for Different Values of Parameter ft
  5.3 Evaluation for Different Values of Parameter B
  5.4 Application to a Speech Recognition Task


Similar resources

A fast search method of speaker identification for large population using pre-selection and hierarchical matching

This paper investigates search performance during the matching phase of a speaker identification system based on vector quantization (VQ). Each person's voice is recorded in an office room with personal computers. The LPC-cepstrum is selected as the feature vector. To achieve a higher identification success rate, a larger codebook is needed for each person. Consequen...


Learning of Invariant Object Recognition in a Hierarchical Network

In this paper we propose an object recognition system implementing three basic principles: forming temporal groups of features, learning in a hierarchical structure, and using feedback to predict future input. It gives very good results on publicly available datasets. A precondition for successful learning is that training images are presented to the system in an appropriate order such that i...


A study on LVCSR and keyword search for Tagalog

We describe a state-of-the-art large vocabulary continuous speech recognition (LVCSR) and keyword search (KWS) system trained on roughly 70 hours of conversational telephone speech. Using the Kaldi speech recognition toolkit, we investigate several aspects: for the acoustic front-end, we analyze the use of mel-frequency cepstral coefficients (MFCC), pitch and probability-of-voicing (PoV), and d...


Robust Speech Recognition by DHMM with A Codebook Trained by Genetic Algorithm

This paper uses genetic algorithms to train a codebook for a Discrete Hidden Markov Model (DHMM) applied to speech recognition. The GA-trained DHMM is then used to increase the recognition rate for Mandarin speech. Vector quantization based on a codebook is a fundamental step in recognizing speech signals with a DHMM. A codebook is first trained by genetic algorithms throug...


Fuzzy Vector Quantization on the Modeling of Discrete Hidden Markov Model for Speech Recognition

This paper applies fuzzy vector quantization (FVQ) to the modeling of a Discrete Hidden Markov Model (DHMM) in order to improve the recognition rate for Mandarin speech. Vector quantization based on a codebook is a fundamental step in recognizing speech signals with a DHMM. A codebook is first trained by the K-means algorithm using Mandarin training speech. Then, based on the trained...




Publication date: 2014